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We consider the problem of statistical inference for the S distribution 
and introduce new minimum distance estimators for the four parame- 
ters of the S distribution using Kolmogorov-Smirnov, Cramer-von Mises 
and related distance metrics. Approximate goodness-of-fit and confidence 
intervals for parameters are calculated using bootstrap methods. We dis- 
cuss further how the S distribution can be used to solve various prob- 
lems of statistical modeling associated with parameter inference, includ- 
ing goodness-of-fit tests, Monte Carlo simulations and modeling trends in 
the distributions. 
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1 Introduction 

Parameter estimation is important for solving various problems associated 
with statistical inference, e.g. for hypothesis testing and for Monte Carlo 
and stochastic modeling. In practical terms, the very first step in such 
modeling involves choosing a distribution function (d.f.) for the proba- 
bility law that best describes the random process and/or available data. 
Very often one is presented with a poorly understood process to model 
and with samples of data of realistic (moderate) size, and then this choice 
can be far from unique. In other words, several distributions in common 
use (e.g., Normal, Logistic, Weibull, Laplace etc.) often can be used with 
almost equal success. This difficulty can be alleviated by using distribu- 
tional families. Such families however tend to be unwieldy mathematically 
and even present serious compuational difficulties, for example, poles in 
the Pearson system of distributions. 
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Abstract 



A useful approach for these problems has been developed over the 
past ten years. It involves a univariate c ontinuous f our-p arameter distri- 
butional family called the S distribution ( Savageau , 1982[ ) that is capable 
not only of approximating ma ny central a nd noncentral unimodal univari- 
ate distributions rather well (Voit, 1992), but also of representin g an un- 
countable multitude of others as its parameters change smoothly ( [gorribas 



et al 



2000). It includes Exponential, Logistic, Uniform and Linear distri- 
butions as special parametric cases. The S distribution derives its name 
from the fa ct that it is based on the theory of S-systems ( Savageau , 



1976 



Voit, 1991 ). The versatility and relative mathematical simplicity of the 
S distribution prompts for its use in statistical inference problems. We 
note here that the errors resulting from approximation seem to be a small 
price relative to the advantage of having a "best fit" distribution readily 
available for a particular problem. A number of parameter estimation 
techniques have been proposed over the years (reviewed briefly below). 

In this article we propose minimum distance (MD) estimators for the 
S distribution parameters that make use of goodness-of-fit statistics of 
supremum (Kolmogorov-Smirnov and Kuiper) and quadratic (Cramer- 
von Mises and Watson) types as distance metrics between empirical d.f. 
and S distribution d.f. defined by Equation (|lj) below. Note that there 
is no categorization of data involved. The MD estimators then can be 
used in testing the goodness-of-fit hypothesis. We propose bootstrapping 
to approximate critical values for a goodness-of-fit test and then calculate 
approximate confidence intervals for the parameter estimates. Neither 
goodness-of-fit nor accuracy of estimates have previously been evaluated 
for S distribution estimations. We illustrate the new method with exam- 
ples of parameter inference using data generated from the S distribution. 
We conclude with a discussion of the uses of the S distribution in sta- 
tistical modeling problems. A computational realization of the proposed 



estimation procedure is described elsewhere (Aksenov et al. 2001) 



2 The S distribution 



The S distribution is defined in terms of its d.f. F{x) (Savageau, 1982), 
which is the solution of the following initial value problem (i.v.p.) for an 
ordinary differential equation (o.d.e.) 



(IF 



f(x) = ^- = a (F 9 - Fh ) - F (*o) = F ° (1) 

Note that the right-hand side of Equation (|l|) is also the probability func- 
tion (p.f.) f(x), which is thus an algebraic function of the d.f. The S 
distribution has four parameters 

e = (g,h,a,x ) (2) 

with xo for location, a for scale and g and h for shape. Often it is conve- 
nient to choose xo as a median, Fo = 0.5. 

The S distribution in Equation (Q) is completely specified by its four 
parameters 9. Conditions on the parameters, a > and g < h, ensure 
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that F(x) is a proper d.f., i.e. a monotone function of the random variable 
x with F( — oo) = and F(oo) = 1. 

We can obtain a simple condition on the parameters that provides for 
unimodality of the S distribution. Differentiating the p.f. in (Q) for x and 
equating to zero we obtain 

f = F h ~ 9 (3) 



One can immediately see that, given g and h, this equation has a real 
solution F (and hence mode x m ) if g and h are both positive, g > and 
> 0. Otherwise, the S distribution has a half- mode (i.e., is J-shaped). 
This follows from the left-hand side of Equation ([]) being negative if g and 
h have different signs, and greater than one if g and h are both negative. 
The right-hand side of Equation (^) has a positive real value between 
and 1 for any g < h. 

We can define skewness of the S distribution by evaluating the d.f. at 
the mode (Savageau, 1982): 



S = F{ Xm ) 



l/C>-9) 



(4) 



Now the S distribution is skewed to the right if the mode is less than the 
median, S < 0.5, is symmetrical if the mode coincides with the median, 
S — 0.5, and is skewed to the left if the mode is greater than the median, 
S > 0.5. For negative g, the S distribution is skewed to the right. 

Moments of the S distribution can be obtained numerically by inte- 
grating the p.f. 



fi' r = E(x r ) = a [ x (F 9 {x) - F h {x)) dx 



(5) 



simultaneously with the solution of the o.d.e. (Q). 

Quantiles of the S distribution can be obtained by using the fact that 
there is a monotone one-to-one relation between the random variable and 
the d.f. Thus, we can rewrite the o.d.e. (Ill) as 



dx 

IF 



i 



a Fa - F h ' 



x(F ) 



Xq 



(6) 



The solution of this equation can be obtained numerically. A closed form 
solution can be found in terms of element ary transcendental functio ns for 
a certain subclass of parameters g and h ( Voit an d_javageau . 1984 ) or in 
terms of Lerch's transcendent for all g,h £ K (Hernandez-Bermejo and 



Sorribat, 2001 ). This is done by separating variables in either o.d.e. (|l|) 



or (H) and integrating to obtain the following 



i(F) = x Q + - f 



dt 



ta - t h 



(7) 



Voit and Savageau ( 1984 ) solved the integral in Equation (]7|) in terms of 
elementary transcendental functions for g, h £ R, when g = (ha — l)/(cr — 
1) and a signed rational number u / 1, or for g, h £ R, when g < h = 1 
and cr=l. 



Hernandez-Bermejo and Sorribas (2001) represented the integrand in 
Equation (Q) as an infinite sum to obtain 



-i oo pF 

x(F) = x + j- I t k(h ~ 9) ~ 9 dt (8) 



a 

k=0 



The sum converges to a finite value if k(h — g) — g ^ — 1, which covers 
g, h 6 R when g<h<lorl<g<h such that g =fc (hk+l)/(k+l) for k G 
N. This condition defines a "generic" quantile solution. In the "generic" 
case, the quantile function is expressed in terms of Lerch's transcendent 
as follows: 

x = x + ,/ - ( F 1 -^ ( F h - 3 ,1, 1 



a(l -g) \ \ h- g 

^-^(^"M.Xrf)) ( 9 ) 



where Lerch 's transcendent &(z, s, v) is defined by the following series (Magmi! 
* |l966|): 



et al 



fc=0 ^ ' 

In the "nongeneric" case, which covers g, h € ffi when 1 < g < ft and 
<? = (Wc + l)/(fc + 1) for k g N, the integral in (^) produces a logarithmic 
term that has to be integrated separately. For (7 = 1, the logarithmic term 
is at k* = and so the quantile function is 

and for g > 1, the logarithmic term is at k* = (g — l)/(h — g) and the 
quantile function is 




ph(h—g)—g+l _ pk(h-g)-g + l 

fc(ft-3)"5+l 



(12) 



Note that the "nong eneric" solution is identical to the one found by |Voit 
and Savageavj ( 1984 ) for a — —k 



The explicit quantile function given by Equations (§), (0) and @ has 
several important consequences for both parameter inference and model- 
ing. First, it can be shown that liniF-»o+ x(F) — constant for g < 1 (the 
"generic" case). In other words, there is no infinite tail for the correspond- 
ing S distribution, which then becomes left-truncated. The truncation 
point can be calculated by letting F = in Equation (61) 

— -;&(*"•■'• £5) (13) 

Second, the existence of a finite x* calls for care when solving o.d.e. (jl|) 
numerically, because the solution of the o.d.e. is not unique at x* where 
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the solution F{x*) = joins the trivial solution F = 0. Nonuniqueness 
can be formally checked by showing that the Lipschitz condition is not 
satisfied at x* . Most o.d.e. solvers, whose algorithms assume uniqueness 
of the solution, will have trouble converging near x* . Third, an explicit 
expression for the mode of the S distribution, if it exists, is now possible 
by substituting (g) into Equations (g), ([H]) and (|l|): 

1 ( f g\0-9)/(h-a) ^ ( g , l-o 
* " x o + — -((r) 



a(l — g) \\h/ \h' 'h 



' "il> / J? h ~B I J_ 



1 / 1 /axVC-s) 
^' + a( v l0S ^(f) + 

(1 - Pf- 1 ) / (l - (i)^- 1 )/^-^' 



1 A 1 /0\l/(h-9) 

:E0+ a( 1Og ^(f) + 



(fc(h- 9 )-9 + l)/(h- 9 ) fc{h _ 9) _ 9+1 



^ \{h) - F " 



k(h-g)-g + l 



(14) 



Finally, the quantile function makes it easy to use a direct inversion 
method for the generation of random variates from the S distribution. 



3 Methods for parameter inference 



Existing techniques for estimating parameters of the S distribution, given 
a random samp le \x j \ i = 1, . . . , n, have been based on grap hical, nonlin - 



ear regression (Vbit, 1992), and maximum likelihood (ML) (Voit, 2000a) 



methods. The graphical method is relatively straightforward. As F — > 0+ 
the term F 9 dominates over the term F and the plot of In / vs. In F is a 
straight line In / = lna + gln.F with slope g and intercept a. Then, given 
F at the inflection point, which corresponds to the mode (if it exists), and 
the previously estimated g, one can estimate h from Equation 

Nonlinear regression can be accomplished with x vs. F of the o.d.e. 
or with an equivalent representation in terms of the algebraic equation 
/ vs. F. Regression of the algebraic equation seems to be faster and 
less numerically involved: however, estimating xo is not possible. During 
regression, the residual squared error (r.s.e.) is typically minimized and 
data is represented in a categorical f orm. In a recent example of regression- 
based estimation, Sorribas et al. (2000) propose to estimate xo by the 
sample median, to fix one of the parameters (e.g., a as the inverse of the 
sample standard deviation), and to fit the remaining pair (e.g., g and h) 
using the o.d.e. (^) and suitably categorized data. This procedure helps to 
avoid the algorithmic difficulties associated with fitting four parameters 
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simultaneously. Fixing some parameters and fitting the others exposed 
correlations between the "best-fit" parameters. For example, for a given 
random sample, fixing a at increasing values produced pairs of g and h 
where g was increasing and h was decreasing. In addition, one observed 
uncertainties in the "best-fit" parameters that resulted from optimization 
runs being initialized with different values, and from sampling variability. 
Drawing on the closed-form quantile function given by Equations (B), 



ll|) and (113) , Hernandez-Bermejo and Sorribas (2001) proposed to use 



least-squares fitting of the S distribution quantiles to the sample quan- 
tiles. Theoretical S distribution quantiles are evaluated at values of the 
empirical d.f. , whi c h is a step function with jumps at the data points. 



Recently, Voit (2000a) introduced a ML procedure to calculate esti- 
mates for g and h. After using the algebraic relationship between the p.f. 
and the d.f., the log-likelihood function is 

n 

logL(6>) = nloga + ^log (> '( Xi ;9) - F h (xi;9)^) (15) 

i=l 

Direct minimization of the function (|l^) involves numerical solution of the 
o.d.e. (Q) (unpublished computational realization by Voit and Schwacke; 
also by Sorribas, personal communication). However, Voit suggested an 
approximate ML method that requires only solution of nonlinear algebraic 
equations. First, he replaces the theoretical S distribution d.f. evaluated 
at data points F(xi;9) by the empirical d.f. F that is its consistent es- 
timate. Second, he introduces a constraint on parameters in the form of 
a fixed integral in the phase space dF/dx vs. F, which makes log L(9) a 
function of only g and h. Differentiating logL with respect to g and h 
and equating the derivatives to zero results in nonlinear equations for g 
and h that are then solved iteratively: 

g+l h + 1 n 

= _J 1 , Epilog --F 9 " h (^)) (16) 

h + 1 h — g n 

Note that the last point of the ordered sample causes a discontinuity 
because F(x n ) — 1 by definition. Equations (jlq) can still be solved nu- 
merically, provided one uses L'Hospital's rule to evaluate the last term of 
the sum: 

Um -iHI^- = lim fh . 1 = - J- (17) 

In summary, existing techniques for S distribution parameter estima- 
tion tend to use categorized data and generally lack goodness-of-fit infor- 
mation. These deficiencies motivated us to develop MD estimators. 



4 Minimum distance estimators 



The MD method was introduced by Wolfowitz (1957) and since then has 
proved to be a convenient method for strongly consistent parameter es- 
timation. The idea of the method is to match the empirical d.f. to a 
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theoretical one as closely as possible, using a distance function S(-, ■). In 
the context of the S distribution we have the d.f. (|l|) defined on 9 G Q 
where O is the following subset of R 4 : 

9 := {6> : a > 0, g < h, h £ R, x £ R, C(9, Xi) < 0} (18) 

where the nonlinear constraint function C is defined as 

C = x - njin 4= i,...,„(s i ) + f° 9 , $ (f^~ 9 , 1, (19) 

The nonlinear constraint (jll]) is essential for estimation because it en- 
sures that the S distribution is consistent with the data at all times, and 
at the optimum vector 9 in particular. This means that the truncation 
point, which is finite, is less than the minimum observed data point and 
thus that the d.f. is defined at all data points. Of course this does not 
ensure against the possibility that an even lower data point just has not 
been observed and thus that the true population should be truncated at 
an even lower value of x, or not truncated at all. Note that evaluation 
of the constraint function C depends on an accurate and fast method 
for calculation of Lerch's transcendent <&(z,s,v). This is n ow possible 



with recent advances in convergence acceleration techniqu es (Aksenov et 
"a!] , jgoj jjentschura et al.| , |l999 ; [lentschura et J] , [200 1| ). The d.f. is 
calculated at Xi using Equation (11) as long as the desired solution is not 
too close to the truncation point x*. In the immediate proximity of a;* 
one can solve the nonlinear (transcendental) equation (^) for F. 
The empirical d.f. is defined as a step function 



p = #{ Xi <x) 
n 

where #(■) signifies the number of Xi less than or equal to x and weight 
1/n is put on each point; if there are tied observations, proportionately 
more weight is put on the unique points. 

Given the distance metric 5(F,F), a MD estimator 6 is given by the 
solution of the following equation 

5(F(xi), F(xi-, §)) = ud ee eS(F(xi), F(x t ; 9)) (21) 

As the distance metric S, we consider here four goodness-of-fit statis- 
tics of the supremum (Kolmogorov-Smi rnov and Kuiper) and quadrati c 



(Cramer- von Mises and Watson) types ( D'Agostino and Stephens , 1981: ) . 
This allows us to combine estimation with testing the validity of fit. The 
Kolmogorov-Smirnov statistic D is the largest unsigned vertical distance 
between F and F, the Kuiper statistic V is the sum of the largest signed 
vertical distances, the Cramer-von Mises statistic W 2 is the integral of 
the squared differences between F and F, and the Watson statistic U 2 is 
a modified version of W 2 : 

D = sup x \F(x) - F(x)\ 

V = sup x (f(x) - F(xj) + sup x (P(x) - F(xj) 



W 2 = n [ (P(x) - F(x)Y dF(x) 

J — OC 
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XT 



j — ( 



F(x)-F(x)- 



F(x) - F{x)) dF[x)\ dF(x) 



(22) 



For the one-sample problem that we are dealin g w ith, computational for- 
mulas for the statistics can be derived from ( |22[ ) using the probability 
integral transformation z = F(x), where z is uniformly distributed, and 
letting sn = F(xi): 



D = 
V = 
W 2 = 



max I max<=i, 



Zi | , maxi=i, 

n 



i - 1 



, i \ i-l 

maxi = i ...,„ Zi) +maxj = i...., n Zi 



1 / 2i - 1 

T*7 + E * 



2h 



[/ 2 = K/ 2 -n(^^--0.5 



(23) 



The MD estimators 6 obtained as a solution of Equations (|l8|), ( p^[ ) 
and (|^) and with metrics 5 ( ^ ) can be used in testing the goodness-of- 
fit. Formally, we wish to test the composite hypothesis that the random 
sample {a;;} comes from the S distribution: 



Ho-.FeT 



(24) 



against general alternatives, where T is the class of the S distribution d.f.s 



T ~ {F(-,6) : 6 G e> 



(25) 



Now given parameter estimates 6, one calculates the empirical d.f.- 
based goodness-of-fit statistic 3 (^) and compares it with the critical point 
corresponding to a specified significance level, typically 0.01 or 0.05. For 
the so-called case 0, when the distribution F(x) is completely specified, 
asymptotic distributions of go odness-of - fit statistics are k nown and critical 
points have been tabulated (Stephens, 



197C 



Stephens , 1974) 



However, 

in the general case when parameters are estimated from data, distribu- 
tions of statistics have to be approximated. A relatively straightforward 
though computationally-intensive way of obtaining the critical points is 
to approximate sampling distributions of goodness-of-fit statistics by the 
bootstrap method. The asymptotic validity of the bootstr ap method for 



MD goodness-of-fit tests was estab lished in (Beran, 1986) and for more 



general problems in (Romano, 1988). With the bootstrap method, one can 



also calculate approximate confidence intervals for parameter estimates. 
The M D es t imators have been show n to have an asymptotic distribu- 



tion (Sahlei 



197C; Bolthausen 



1977) 



T he bo otstrapping algorithm is as outlined in (Efron and Tibshi 



1993 ). One samples B times with replacement, either from an em- 
pirical d.f. (EB) of the sample (in a nonparametric mode) or from the 



8 



parametric model with parameters 8 (in a parametric mode), and calcu- 
lates parameter estimates and goodness-of-fit statistics exactly the same 
way as with the original sample. These are now called bootstrap replica- 
tions 8* and 5*. The lower and upper critical values for a goodness-of- 
fit statistic S, corresponding to a significance level a < 0.5, are then the 
k = [(B+l)a/2]th and (B+l — fc)th largest values of the ordered bootstrap 
replications 5*, respectively, where [•] signifies taking the integer part. 
The observed statistic value 5 is then compared with the critical values. 
Comparison with the lower critical value ensures against the so ca l led su - 
peruniformity when the statistic takes too small a value ( [Stephensl |l97c| ). 



Equivalently, one can calculate the achieved significance level (a.s.l.) of 
a statistic 8 that is simply the empirical quantile based on an ordered 
sample of replications S* . The a.s.l. is then compared with the specified 
significance level of the test. 

To obtain a (1 — a) 100% equitailed bootstrap-percentile confidence 
interval for the parameter estimates, bootstrap replications 8* are ordered, 
and lower 8* a and upper #* p endpoints of the interval are again estimated 
by the k = [(B + l)a/2]th and (B + l — fc)th largest values. However, 
bootstrap-percentile intervals can have substantial coverage error as shown 
in the following equation 

P(8 lo <8< 8 np ) = 1 - a + 0(f(n)) (26) 

The bootstrap-percentile method is first-order accurate in that the rate 
with which the coverage error goes to zero is f(n) — n -1 / 2 , as the sample 
size goes to infinity. To improve the a ccuracy, the BCa (bias-corrected 



and accelerated) method was proposed ( Efron , 1987 ). In this method, the 
endpoints 0* o and #* p are calculated as empirical Qith and a^th quantiles, 
respectively, 

Ql = $ [ ZQ + 



i(io + W J ) 

- = (27) 

where $ is the standard Normal d.f. and z a is the crth quantile of the 
standard Normal distribution, i.e. <&(z a ) = a. The bias-correction con- 
stant zq is obtained as the proportion of bootstrapped replications that 
are less than the observed value, 

fc = «-i [ # {§ * B <S} ) (28) 

where $ _1 is the Normal quantile function (i.e., the inverse of the d.f.). 
The acceleration connstant a can be estimated in terms of the jackknife 
values 9u) (8s estimated from the sample omitting the ith point): 

a = i / -^ w (29) 

6 fcufeu *<.-)/» -V 2 
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The BCa intervals are second-order accurate in that the coverage error 
goes to zero with rate f(n) 



■ n 



-i 



Like all bootstrap estimates, the confidence interval endpoints have 
variance that is due to sampling error and bootstrap resampling error. We 
can estimate the variance of endpoints (which are sample quantiles) using 
the jackknife-after-bootstrap method in which for each ith data point 
from the ordered sample, one groups bootstrap resamples that do not 
contain that particular point and calculates confidence inte rval endpoint s 



exactly as above over that collection of resamples, Obu) ( Efron , 1992 ). 
The estimate of variance is then 

^) = ^E^,-^) 2 (30) 

One can use Equation (|3^) to decide if a given number of resamples B is 
satisfactory by calculating the coefficient of variation, cv = var 1 / 2 (#)/#, 
as a function of B and choosing a threshold for cv, say 0.1 (which means 
we are unwilling to accept more than 10% of contribution of Monte Carlo 
error to the estimate). 

The BCa intervals can be quite expensive to calculate, esp ecially taking 
into account the optimization step in Equations dlj|), ( |l9|) and (^l|). In 
general, on the order of 1000 resa mples might be commonly needed to 



achieve a small Monte Carlo error (Efron, 1987) 



However, one can focus on coverage accuracy of the bootstrap ap- 
proximation and, instead of accumulating more bootstrap resamples to 
reduce the error, use the number of resamples as a calibration parameter 

to achieve a specified coverage; Su ch an approach leads to the extreme 

bootstrap percetiles method (Lee, 200C| ). As a first step in construct- 



ing the equitailed percentile interval of nominal coverage (1 — a) 100%, 
one solves the following equations for the minimum required number of 
resamples B 

1 ab 3 
«' 2 = ~B~+T + ~B 

1 ab 3 , , 

Q / 2 = B~+i ~ ~~B < 81 > 

where 6 is the positive solution of equation 

B<j>(b - fT 1 ) = b (32) 

and <f> is the standard Normal p.f. The maximum of the two solutions for 
Equations ( |3l| ) and ( [32] ) is then the derived minimum number of resam- 
ples. Equations ( |3l| ) and (^) are obtained from asymptotic expansions 
of the extreme coverage associated with the bootstrap-percentile method, 
and assume the validity of Edgeworth expansions for the bootstrap distri- 
butions of the standardized bootstrapped statistic and the smooth model 
for the statistic as a function of the mean. The validity of these equa- 
tions is thought however to extend to more general statistical function- 



al (Lee, 2000) and thus they are likely to be applicable here. The cov- 



erage error of the extreme percentiles intervals goes to zero with rate 
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f(n) 



-V2 W l/2 



log ' n, which is slower than with the BCa intervals, the 



extreme percentile intervals can however provide a large reduction of com- 
putational effort because the minimum required B will typically be much 
less than 1000. As with all bootstrap estimates, it is worth looking at 
histograms of replications; highly skewed histograms are indicative of in- 
accurate estimates of the tails of the bootstrap distributions and of the 
requirement for more simulation effort. Also note that the confidence in- 
tervals here are the marginal intervals for the parameters 6, constructed 
from a multivariate empirical bootstrap distribution. Construction of the 
simultaneous confidence regions would be rather awkward given the di- 
mension of 6. Confidence regions for two-parameter location and scale 
fa milies based on th e Kolmogorov-Smirnov statistic have been considered 



in ([Easterlinj, |l97C|; 



Littel and Rac, 



1970 



5 Example: inference from the S distri- 
bution data 

Here we apply the MD estimators, derived using the above procedure, 
to parameter inference for a random sample generated from a specified S 
distribution. We use a computational realizatio n of this procedure that 



is a collection of Mathematica and C programs (Aksenov et al, 2001). A 
Mathematica notebook documenting all the calculation in this section is 
available from the corresponding author. 

We generate a random sample of size n — 100 from an S distribu- 
tion with the parameters in (JiJ) given by 8 = (0.5, 1.6, 1.0,0.0). For the 
particular random sample used here, the seed for the Mathematica ran- 
dom number generator was 11235. This distribution is left-truncated with 
truncation point x* = —1.70745 (see Equation (|l^)) and unimodal with 
the mode at x m = —0.38601 (see Equation (fl4|)). 

As a first attempt we estimate parameters using a combination of ex- 
isting methods: let xo be the sample median, a be the inverse of the 
standard deviation of the data and g and h be the approximate maximum 
likelihood estimates calculated using Equations ( pi) ) . These estimates and 
the four goodness-of-fit statistics (E3J) are shown in the third column of 
Table ^. We do no t adj ust & and xo to have better agreement with the 



data as advised in ( Voit , 2000c ) since these estimates serve only as initial 
guesses for the optimization. These initial estimates are reasonably close 
to the population parameters for this particular sample, but of course we 
are more interested in how the estimators behave in the long run when 
applied to other samples from the same population. The goodness-of- 
fit statistics calculated with these estimates are much larger than those 
calculated with the true parameters, but again we do not know how re- 
producible this diffrence is in the long run. 

The MD estimates obtained by usin gEquations @, @ and @ 
with each of the four distance metrics (B3j) are shown in the last four 
columns of Table The values of the goodness-of-fit statistics are sub- 
stantially lower than those calculated with the first estimates or with the 
true parameters. We are now ready for evaluating the goodness-of-fit with 
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the bootstrap method. 

The equitailed extreme-percentile confidence intervals with intended 
coverage of 95% for MD estimators with the four goodness-of-fit statistics 
are shown in Table ^| According to Equations ( pl[ ) and ( |32[ ) with a — 
0.05, only 39 bootstrap resamples were needed for the approximation. We 
performed calculations with nonparametric resampling from the empirical 
d.f. and parametric resampling from the S distribution d.f. with MD 
estimates 9 obtained using the corresponding goodness-of-fit statistics. 
Both nonparametric and parametric intervals have similar lengths and 
shapes (data not shown). Note that the MD estimates based on the 
different goodness-of-fit statistics have intervals of different lengths. For 
example, the quadratic statistics W 2 and U 2 give intervals for h and a 
that are wider than those provided by the supremum statistics D and V. 
Also, all estimators have the true values inside the intervals, except for the 
parametric interval with Kuiper for a, which indicates a possible bias for 
this estimator. Observed values of the goodness-of-fit statistics are within 
their respective 95% intervals, indicating that the null hypothesis ([mJ) 
cannot be rejected at the 0.05 significance level. We note that the a.s.l.s 
for nonparametric bootstraping with supremum statistics are somewhat 
lower than those for parametric bootstrapping, making the nonparametric 
tests conservative. This observation is reversed for the quadratic statistics. 

An alternative way to calculate approximate confidence intervals is 
with the BCa method. Table |§] shows 95% equitailed intervals obtained 
with 4000 parametric and nonparametric bootstrap resamples by using 
Equations (|27|), ( p8[ ) and (p9|). Note again that the supremum statistics 
are more conservative for nonparametric than for parametric tests. The 
availablility of 100 times more resamples than with the extreme-percentile 
method permits more thorough investigation. For example, bootstrap 
distributions of estimates differ radically for the two types of functionals 
(data not shown). For supremum functions D and V, distributions of all 
estimates are more or less symmetric with moderate tails. In contrast, 
for quadratic functions W 2 and U 2 , distributions are highly skewed. This 
leads to wide intervals for h and a. Curiously, distributions for g are 
bimodal, indicating the presence of at least two local minima in the opti- 
mization problem involving quadratic functions. Distributions for xo are 
nearly symmetrical in all cases. High skeweness of distributions is accom- 
panied by high variability of bootstrap estimates for the endpoints of the 
confidence intervals. The overall variability is expected to settle down at 
the level of sampling variability with increasing number of resamples B. 
This indeed happens for the upper endpoints of the estimation based on 
the Kolmogorov-Smirnov distance function, but not for any of the lower 
endpoints, and the pattern is more erratic as we move to the estimates 
based on the quadratic functions (data not shown). 

Plots of the empirical d.f. and the "best-fit" S distribution d.f. are 
shown in Figure |l|. While all four MD estimators give visually good ap- 
proximations, which is also evident from the close agreement among the 
estimates and the population values, the properties of the estimators are 
strikingly different as discussed above. 
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6 Existing applications of the S distribu- 
tion in statistical modeling and suggested 
extensions 

The ability of the S distribution to approximate diverse distributional 
forms suggests its use in various stochastic models that give rise to uni- 
variate unimodal distributions. Methods of parameter inference for the S 
distribution, including the MD estimators proposed in this article, are of 
course critical to any data-based modeling of this sort. 

One of the main application areas is Monte Carlo modeling. Specifi- 
cally, one might be interested in repeated sampling from an S distribuion 
that is the best numerical model for random data. Apart from parameter 
estimation, efficient random number generation from an S distribution 
is required. An exa mple of S distribution mo deling in risk assessment 



studies is provided by Voit and Schwacke (200C). They also described an 



approximate method to sample from an S distribution, by interpolating 
among tabulated S quantiles with a rational function. An exact method 
is to use invers ion with the quantile functions g iven by Equations (g), 
dill) and ( |l2| ) (Hernandez-Bermejo and Sorribas, 2001). In risk analy- 



sis applications, the parameters of risk models are often uncertain and 
it is desirable to investigate the sensitivity of risks to these parameters. 
A fundamental approach is to assign an S distribution to parameters of 
the risk model and simulate it many times in an attempt to evaluate 
the statistcal uncertainties in the risk as a function of the input distri- 
bution of the parameters. A similar application of S distributions can be 
found for hierarchical Monte Carlo simulations in environmental assess- 
ment, where distributions of several parameters are conditioned on each 
other in a hierarchical fashion. In the analysis of mercury contamination in 
king mackerel, this made it possible to obtain contaminant concentrations 



terdependency between model parameters (Voit et al 



tions that i 


gnore statistical in 


(Voit et al. 


, 1995). 



Monte Carlo simulations of a quite different nature can be found in the 



numerical app l ication of mathematically controlled comparisons (Alvcs 



2000c) 



1991 



Savageau 




1972 





Irvine 



Hlavacck and Savageau] , 199S ) is a technique to study in quanti 



tative terms models of complex biological networks with alternative de- 
signs. Well-worked applications of this techn ique led t o the discovery of 



works (pavageau 




1974 




Savageau 




2001 


)• 


and Savageau 




1985aJ pTvine and Savageau 



savageau, 1972), gen e net 



meth od is based on S-systems within the power-law formalism (Savageau, 
19761 ), which leads to models of alternative designs that are often amenable 
to analytical solution and general conclusions. The numerical exten- 
sion is motivated by the need to eliminate some uncertainties associ- 
ated with the classical analytical approach (e.g., numerical comparison 
of alternatives that depends on specific parameter values), and also by 
the need to address more complicated situations when power-law models 
are intractable analytically such as models of elementary signal transduc- 
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tion modules based on covalent modification of proteins. In these cases, 
the idea is to sample parameters of such models from their (generally 
unknown) distributions and to evaluate statistical properties of model 
output properties, like steady-state levels of variables and logarithmic 
gains. Although parameters were sampled from uniform distributions 
in rec ent applications of numerical mathematically c o ntrolled compar- 
isons ( Alv es and Savagcar , 2000b ; Alv e s and Savageau, 2000c; Alves and 



Savagealt " 2000d ; Alves ant Savageau, 2001), sampling parameters from 



suitably chosen S distributions will be more appropriate in situations when 
the distributions are clearly not uniform. 

Another application of the extroadinary flexibility of the S distribution 
in modeling various data structures is the study of distributional trends, 
which allows one to make inferences about the dynamics of probabilis- 
tic models of some stochastic processes. Such mode ls often include both 
stochastic and deterministic components (Volt, 1996). An example of such 
an approa ch is the analysis of tren ds in the distributions of tree sizes with 
their age (Voit and Sorribas, 2000). Observations show that the distribu- 
tion of tree trunk diameters changes with their age, even as radically as 
reversing the skewness. By combining a deterministic component of the 
process (growth function of a tree) with a stochastic one (distribution of 
tree trunk diameter in the population) these authors were able to predict 
the change in distributional shape as a function of age, w hich is in good 
agreem ent with the observed data. Along the same lines, sorribas et al 
(200C) considered growth trends in children (e.g. weight) with the mod- 
ification that the trends of distributional parameters were established by 
regression, rather than from a deterministic model. Similar technique also 



was used in (Voit et al 



1995; Balthis et al 



1996). 



7 Discussion 

In this article we addressed an important issue in statistical modeling, that 
of statistical inference about a distribution. The S distribution, which has 
been demonstrated to be a highly useful tool for various kinds of statist- 
cal modeling, so far has lacked an estimator that is relatively straightfor- 
ward, that makes a minimum reduction of information in the data, and 
that is amenable for goodness-of-fit analyses. The MD estimators that we 
propose fill this gap. Several features of the MD estimators make them 
viable alternatives to other methods. Under some regularity conditions, 



MD estimators are strongly consistent (Sahler, 1970). This result applies 
in particular to the supremum and quadratic distance functions consid- 
ered in this article, and to the S distribution which has a continuous d.f. 
Also, MD estimators are invariant with respect to transformation of the 
estimand, a property that they share with ML estimators. Finally, the fol- 
lowing feature makes MD estimators especially relevant in the context of 
the S distribution. When the hypothesized model does not belong to the 
class of parameterized d.f. models that generate the sample (i.e., when the 
model is wrong), MD estimators provide the "best approximation" from 
the class of models (|5f]). This is fully in the spirit of the S distribution be- 
ing the best approximating distribution of the unknown population. This 
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feature is not shared by ot h er est imation methods, including ML and the 
method of moments (Pan, 1981). Also, MD estimators are natural can- 
didates for use in goodness-of-fit tests if the dista n ce fun ction is used as 
a goodness-of-fit statistic, as noted by Bolthausen (1977). 

Given the numerical nature of the S distribution and the estimation 
method, large-scale Monte Carlo siumulations will be needed to establish 
properties of the estimator and the power of different distance metrics 
relative to other estimation methods, maximum likelihood in particular. 
However, the example we have given of inference from a random sample 
generated from an S distribution shows the utility of the new estima- 
tor. First, BCa bootstrap confidence intervals seem to require many more 
resamples than is thought appropriate for a general case: even the 4000 
resamples reported here are clearly not enough to reduce the variance of es- 
timates of the confidence intervals endpoints for all parameters. Variance 
estimated with jackknife-after-bootstrap is even more erratic for quadratic 
than for supremum statistics. Second, bootstrap distributions of MD es- 
timates are highly skewed for all functions except Kolmogorov-Smirnov, 
which makes it the only one to have confidence intervals of reasonable 
length and shape and to be overall more trustworthy. This observation 
goes along with the finding that consonance sets for location and scale pa- 
rameters based on the Kolmogorov-Smirno v stati s tic ha ve some desirable 
properties, i.e. they are finite and convex (Salvia, 1980). Reasons for the 
rather erratic behavior of the quadratic distance functions seem related 
to the fact that they are generally more sensitive to deviations from the 
model than supremum functionals. During bootstrapping, variability of 
resamples is more pronounced in the tails of the bootstrap distributions 
and that is where the estimation of confidence interval endpoints (as sam- 
ple quantiles) takes place. Thus the variability of these estimates should 
be greater with the quadratic functionals. For these reasons, the quadratic 
functionals in the context of S distributions seem to be less robust for es- 
timation purposes, in contrast to what was found for more traditiona l 
location-scale families in Monte Carlo studies ( Parr and Schucany , 198C). 
Finally, expensive BCa intervals can be replaced by at least ten-fold less 
expensive extreme percentiles intervals, with little loss of accuracy. 

We have shown that MD estimation coupled with bootstrap analysis of 
goodness-of-fit makes the S distribution a valuable tool for various kinds 
of Monte Carlo statistical modeling. 
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Parameter Population Combined estimate Minimum distance estimate 









KS 


KP 


CVM 


Wat 


9 


0.5 


0.810135 


0.516187 


0.504587 


0.628322 


0.633029 


h 


1.6 


1.40772 


1.49055 


1.36926 


1.57269 


1.62646 


a 


1.0 


0.923027 


1.18409 


1.25795 


1.29237 


1.25799 


X 


0.0 


0.0212869 


-0.00788733 


0.0313802 


0.0226832 


0.0117285 


D 


0.0565035 


0.233542 


0.0398301 








V 


0.0939742 


0.410878 




0.0790387 






W 2 


0.0338124 


1.76827 






0.0197829 




u 2 


0.0324051 


1.74806 








0.0197229 



Table 1: Parameter estimates from an S distribution sample data (n = 100). Estimates obtained by a combination of existing 
methods (Combined estimate) and by minimizing KS (Kolmogorov-Smirnov), Kuiper (KP), Cramer-von Mises (CVM) or 
Watson (Wat) distance metrics (Minimum distance estimate). See text for details of calculation. 



Parameter 



Confidence interval 
KS 



KP 



CVM 



Wat 



a 



x 



D 



V 



w 



u 2 



(0.467339, 0.818564)* 
(0.271795, 0.846612)t 
(1.329, 2.17525) 
(0.975641, 2.01904) 
(0.882613, 1.56611) 
(0.946866, 1.551) 
(-0.146193, 0.156423) 
(-0.199951, 0.261195) 
(0.0361164, 0.0718024) 
p = 0.846154 
(0.0241676, 0.0632652) 
p = 0.641026 



(0.40623, 0.952086) 
(0.289727, 0.935185) 
(1.2042, 2.4769) 
(1.13551, 2.1405) 
(0.650582, 1.65693) 
(1.02877, 1.56108) 
(-0.33061, 0.258266) 
(-0.27011, 0.250137) 



(0.0668382, 0.130067) 
p = 0.871795 
(0.0533178, 0.108985) 
p = 0.512821 



(0.179366, 
(0.208938, 
(0.943656, 
(0.814594, 
(0.404169, 
(0.503189, 
(-0.178406 
(-0.157073 



0.96643) 
0.988865) 
7.29396) 
3.64494) 
6.02715) 
13.1778) 
, 0.159036) 
, 0.262181) 



(0.0108397, 0.0625526) 
p = 0.615385 
(0.012497, 0.0689029) 
p = 0.769231 



(0.0777369, 1.01187) 
(0.203831, 1.15758) 
(0.869806, 7.98307) 
(0.869556, 22.2877) 
(0.398919, 20.7098) 
(0.449198, 9.42905) 
(-0.258809, 0.164895) 
(-0.235997, 0.271462) 



(0.0120847, 0.0504255) 
p = 0.666667 
(0.0112161, 0.0971751) 
p = 0.717949 



*Top entries calculated with nonparametric bootstrap 
6 tBottom entries calculated with parametric bootstrap 



Table 2: Equitailed extreme-percentile bootstrap confidence intervals. Intended coverage of 95% for minimum distance pa- 
rameter estimates from sampled data (n = 100) obtained from the S distribution with parameters g — 0.5, h = 1.6, a = 1.0, 
xo = 0.0. Methods include KS (Kolmogorov-Smirnov), Kuiper (KP), Cramcr-von Mises (CVM) and Watson (Wat) goodness- 
of-fit statistics. See text for details of calculation. 



Parameter 



Confidence interval 
KS 



KP 



CVM 



Wat 



a 



x 



D 



V 



w 2 



u 2 



(0.296367, 0.681915)* 
(0.259294, 0.70718) t 
(1.13667, 1.79035) 
(1.14481, 1.94514) 
(0.835117, 1.35433) 
(0.846121, 1.31254) 
(-0.196492, 0.151388) 
(-0.216363, 0.225872) 
(0.0282195, 0.0448809) 
p = 0.8845 
(0.02683, 0.0531105) 
p = 0.634 



(0.204395, 

(0.274027, 

(0.981214, 

(0.960117, 

(0.914392, 

(0.908877, 

(-0.188277, 

(-0.182983. 



0.693703) 
0.846456) 
1.65894) 
1.90602) 
1.63598) 
1.63787) 
0.280725) 
0.386315) 



(0.0511786, 0.0907438) 
p = 0.86125 
(0.0572763, 0.111431) 
p = 0.55325 



(0.0257174, 0.888425) 
(0.104614, 0.959833) 
(1.01331, 6.36995) 
(0.915669, 5.68933) 
(0.331079, 5.24991) 
(0.354572, 7.13847) 
(-0.133676, 0.196232) 
(-0.190276, 0.251492) 



(0.00993233, 0.0356645) 
p = 0.64 

(0.00744214, 0.0311884) 
p = 0.81 



(0.00721408, 0.955644) 
(0.0916488, 1.04173) 
(0.96578, 10.1783) 
(0.885967, 6.62075) 
(0.314168, 4.48109) 
(0.353748, 7.24277) 
(-0.173835, 0.181506) 
(-0.202019, 0.278002) 



(0.00989892, 0.037251) 
p = 0.59575 

(0.00707178, 0.0322339) 
p = 0.7805 



*Top entries calculated with nonparametric bootstrap 
^Bottom entries calculated with parametric bootstrap 



Table 3: Equitailed BCa percentile bootstrap confidence intervals. Intended coverage of 95% for minimum distance parameter 
estimates from sampled data (n = 100) obtained from the S distribution with parameters g = 0.5, h = 1.6, a = 1.0, 
xo = 0.0. Methods include KS (Kolmogorov-Smirnov), Kuiper (KP), Cramcr-von Mises (CVM) and Watson (Wat) goodness- 
of-fit statistics. See text for details of calculation. 




Figure 1: Empirical (dots) and theoretical S distribution (lines) d.f.s calculated 
with the MD estimates. See Table [l] for population and estimated parameters. 



22 



